46 research outputs found

PPLook: an automated data mining tool for protein-protein interaction

Author: A Chatr-aryamontri
BJ Breitkreutz
BJ Stapley
C Blaschke
C Friedman
D Braga
D Shreiner
D Zhou
JH Eom
JM Fernández
JM Temkin
JW Cooper
L Hermjakob
L Salwinski
Li Xia
MP Marcus
N Daraselia
Quan Pan
RS Wright
S Chernov
S Kim
Shao-Wu Zhang
T Ohta
T Ono
Y Tsuruoka
Y Tsuruoka
Yao-Jun Li
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Extracting and visualizing protein-protein interactions (PPIs) from the literature is a meaningful topic in protein science, because it assists the identification of interactions among proteins. There is a lack of tools that extract PPIs and visualize and classify the results. Results: We developed a PPI search system, termed PPLook, which automatically extracts and visualizes PPIs from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keyword-dictionary pattern-matching algorithm, and display topological parameters such as the number of nodes, edges, and connected components. The visualization component of PPLook enables us to view the interaction relationships among the proteins in a three-dimensional space based on the OpenGL graphics interface. PPLook can also select a protein semantic class, count the number of proteins of a semantic class that interact with the query protein, and count the number of articles that report an interaction involving the query protein. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface. Conclusions: PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available for non-commercial users at http://meta.usc.edu/softs/PPLook.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb

Author: A Stark
Antonio Jimeno-Yepes
BJ Polacco
BJ Stapley
C Blaschke
C Blaschke
C Friedman
CH Wu
CJO Baker
CJO Baker
D Bourigault
D Rebholz-Schuhmann
D Rebholz-Schuhmann
Dietrich Rebholz-Schuhmann
DL Wheeler
DM Kristensen
EM Marcotte
F Cerbah
F Guenthner
F Horn
G Leroy
JA Barker
JC Nebel
Kevin Nagel
LC Lee
M Ikeda
MM Babu
P Pezik
R Kanagasabai
R Witte
S Gaudan
S Yoon
TJ Oldfield
Y Miyao
Y Tateisi
Y Tsuruoka
YL Yip
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on methods for extracting point mutations from the biomedical literature, which can be used to support the time-consuming work of manual database curation. However, these methods are limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level. Results: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the surrounding text. MEDLINE abstract texts have been processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Then, contextual features were extracted through shallow and deep parsing and the features have been classified into predefined categories (F1-measure ranges from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations have been assessed automatically and manually against reference data resources. Conclusion: This work proposes a solution for the automatic extraction of functional annotation for protein residues from biomedical articles. The presented approach extends other existing systems in that a wider range of residue entities is considered and features of residues are extracted as annotations.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

mspecLINE: bridging knowledge of human disease with the proteome

Author: AM Cohen
B Ye
BJ Stapley
BT Alako
C Bennett
CC van der Eijk
DJ Slotta
E Keogh
Eric W Deutsch
EW Deutsch
F Desiere
H Liao
H Liu
HJ Lowe
J Boyle
J Saltz
Jeremy Handcock
John Boyle
M Li
M Li
M Li
MY Brusniak
P Khatri
P Mallick
P Picotti
P Shannon
PA Covitz
R Cilibrasi
R Cilibrasi
R Homayouni
RL Cilibrasi
S Deerwester
V Lange
Y Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack information about human disease for researchers studying disease-related proteins. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary. We used an established semantic distance measure that is based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database. Results: The mspecLINE web application allows researchers to explore relationships between human diseases and parts of the proteome that are detectable using a mass spectrometer. Given a disease, the tool will display proteins and peptides from PeptideAtlas that may be associated with the disease. It will also display relevant literature from MEDLINE. Furthermore, mspecLINE allows researchers to select proteotypic peptides for specific protein targets in a mass spectrometry assay. Conclusions: Although mspecLINE applies an information retrieval technique to the MEDLINE database, it is distinct from previous MEDLINE query tools in that it combines the knowledge expressed in scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Using Unsupervised Patterns to Extract Gene Regulation Relationships for Network Construction

Author: A Ozgur
BJ Stapley
C Blaschke
C Nedellec
C Rodriguez-Penagos
CC van der Eijk
CF Schaefer
D Klein
D Klein
Dongxiao Zhu
E Buyko
Hei-Chia Wang
HM Muller
Hung-Yu Kao
J Saric
J Saric
JH Chiang
K Fundel
L Tanabe
M Huang
R Chowdhary
R Hoffmann
R Jelier
S Kim
S Pyysalo
Shaw-Jenq Tsai
Shuo-Jang Li
T Ono
TK Jenssen
U Hahn
Yi-Tsung Tang
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: Gene expression is usually described in the literature as a transcription factor X that regulates a target gene Y. Previous studies have discovered gene regulations by using information from the biomedical literature, but most of them require human annotators to build the training dataset. Moreover, the large amount of textual knowledge recorded in the biomedical literature grows very rapidly, and the manual creation of patterns from the literature becomes more difficult. There is an increasing need to automate the process of establishing patterns. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we describe an unsupervised pattern generation method called AutoPat. It is a gene expression mining system that can generate unsupervised patterns automatically from a given set of seed patterns. The high scalability and low maintenance cost of the unsupervised patterns could help our system to extract gene expression from PubMed abstracts more precisely and effectively. CONCLUSIONS/SIGNIFICANCE: Experiments on several regulators show reasonable precision and recall rates, which validate AutoPat's practical applicability. The conducted regulation networks could also be built precisely and effectively. The system in this study is available at http://ikmbio.csie.ncku.edu.tw/AutoPat/

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Identification and Analysis of Co-Occurrence Networks with NetCutter

Author: A Brazma
A Kel
B Snel
BJ Stapley
BP Berman
C Perez-Iratxeta
D Chaussabel
D Rebholz-Schuhmann
DL Wheeler
DR Masys
EM Marcotte
EM Marcotte
Francesco Mancuso
G Dennis Jr
G Finocchiaro
GW Flake
Heiko Müller
J Ding
JD Wren
Ji Zhu
L Tanabe
LA Goodman
M Bansal
M Girvan
M Markstein
M Pellegrini
MA Huynen
ME Newman
ME Newman
MJ Schuemie
MS Halfon
NR Smalheiser
P Sudarsanam
R Elkon
RE Tarjan
RL Tatusov
S Tavazoie
S Zhu
S Zhu
SA Jelinsky
SX Chen
T Manke
TC Rindflesch
TK Jenssen
WW Wasserman
Y Pilpel
Publication venue: Public Library of Science
Publication date: 10/09/2008

BACKGROUND: Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. The methodologies and statistical models used to evaluate the significance of association between co-occurring entities are quite diverse, however. METHODOLOGY/PRINCIPAL FINDINGS: We present a general framework for co-occurrence analysis based on a bipartite graph representation of the data, a novel co-occurrence statistic, and software performing co-occurrence analysis as well as generation and analysis of co-occurrence networks. We show that the overall stringency of co-occurrence analysis depends critically on the choice of the null-model used to evaluate the significance of co-occurrence and find that random sampling from a complete permutation set of the bipartite graph permits co-occurrence analysis with optimal stringency. We show that the Poisson-binomial distribution is the most natural co-occurrence probability distribution when vertex degrees of the bipartite graph are variable, which is usually the case. Calculation of Poisson-binomial P-values is difficult, however. Therefore, we propose a fast bi-binomial approximation for calculation of P-values and show that this statistic is superior to other measures of association such as the Jaccard coefficient and the uncertainty coefficient. Furthermore, co-occurrence analysis of more than two entities can be performed using the same statistical model, which leads to increased signal-to-noise ratios, robustness towards noise, and the identification of implicit relationships between co-occurring entities. Using NetCutter, we identify a novel protein biosynthesis related set of genes that are frequently coordinately deregulated in human cancer related gene expression studies. NetCutter is available at http://bio.ifom-ieo-campus.it/NetCutter/.
CONCLUSION: Our approach can be applied to any set of categorical data where co-occurrence analysis might reveal functional relationships such as clinical parameters associated with cancer subtypes or SNPs associated with disease phenotypes. The stringency of our approach is expected to offer an advantage in a variety of applications.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae

Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. Methodology/Principal Findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org. This work was supported by grants from the N.S.F. (IIS-0325116, EIA-0219061), N.I.H. (GM06779-01, GM076536-01), Welch (F-1515), and a Packard Fellowship (EMM). These agencies were not involved in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript. Cellular and Molecular Biolog

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Texas ScholarWorks

In silico pathway reconstruction: Iron-sulfur cluster biogenesis in Saccharomyces cerevisiae

Author: A Ramazzotti
A Salvador
A Sorribas
A Sorribas
A Tovchigrechko
A Tovchigrechko
AC Adam
AJ Walhout
AL Bulteau
Albert Sorribas
B Contreras-Moreira
B Contreras-Moreira
B Contreras-Moreira
B Contreras-Moreira
B Contreras-Moreira
B Hernandez-Bermejo
B Hernandez-Bermejo
B Schilke
B Weiner
BJ Breitkreutz
BJ Stapley
BT Bui
BT Bui
C Francke
C Stark
C Voisine
CM Deane
D Chivian
D Chivian
DC Rees
DC Rees
DW Ritchie
EO Voit
EO Voit
F Barras
F Nikitin
F Vilella
G Duby
G Isaya
G Kispal
GD Bader
GR Smith
H Beinert
H Beinert
H Lange
H Li
H Nichol
H Ogata
H Ye
HD Urbina
I Halperin
I Xenarios
IA Vakser
IA Vakser
IA Vakser
J Frazzon
J Frazzon
J Frazzon
J Gerber
J Li
J Li
J Wu
K Aloria
K Chandramouli
K Nishio
KM Misura
L Manzella
L Salwinski
M Fontecave
M Nakao
M Pellegrini
M Pellegrini
MA Savageau
MA Savageau
MA Savageau
MH Barros
MT Rodriguez-Manzaneque
N Guex
N Guex
N Wiedemann
O Christensen
O Gakh
OS Chen
P Bradley
P Gonzalez-Cabo
P Ross-Macdonald
P Uetz
P Uetz
P Uetz
PA Bates
PA Bates
PD Karp
PD Karp
PD Karp
PJ Kiley
R Alves
R Alves
R Alves
R Alves
R Alves
R Alves
R Dutkiewicz
R Heinrich
R Hoffmann
R Hoffmann
R Hoffmann
R Lill
R Lill
R Lill
R Lill
Rui Alves
S McGinnis
S Park
S Vajda
T Ideker
T Ideker
T Lutz
T Schwede
T Schwede
TC Ni
TR Hazbun
U de Lichtenberg
U Muhlenhoff
V Irazusta
W Yang
Y He
Y Ho
Y Zheng
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Current advances in genomics, proteomics and other areas of molecular biology make the identification and reconstruction of novel pathways an emerging area of great interest. One such class of pathways is involved in the biogenesis of Iron-Sulfur Clusters (ISC). RESULTS: Our goal is the development of a new approach based on the use and combination of mathematical, theoretical and computational methods to identify the topology of a target network. In this approach, mathematical models play a central role in the evaluation of the alternative network structures that arise from literature data-mining, phylogenetic profiling, structural methods, and human curation. As a test case, we reconstruct the topology of the reaction and regulatory network for the mitochondrial ISC biogenesis pathway in S. cerevisiae. Predictions regarding how proteins act in ISC biogenesis are validated by comparison with published experimental results. For example, the predicted role of Arh1 and Yah1 and some of the interactions we predict for Grx5 both match experimental evidence. A putative role for frataxin in directly regulating mitochondrial iron import is discarded from our analysis, which also agrees with published experimental results. Additionally, we propose a number of experiments for testing other predictions and further improving the identification of the network structure. CONCLUSION: We propose and apply an iterative in silico procedure for predictive reconstruction of the network topology of metabolic pathways. The procedure combines structural bioinformatics tools and mathematical modeling techniques that allow the reconstruction of biochemical networks. Using the Iron Sulfur cluster biogenesis in S. cerevisiae as a test case we indicate how this procedure can be used to analyze and validate the network model against experimental results. Critical evaluation of the results obtained through this procedure allows devising new wet lab experiments to confirm its predictions or provide alternative explanations for further improving the models.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositori Obert UdL

Transcript expression plasticity as a response to alternative larval host plants in the speciation process of corn and rice strains of Spodoptera frugiperda

Author: A Alexa
A Conesa
A Roy
Antonio Figueira
AP Moczek
AT Groot
B Chevreux
BJ Haas
C Koenig
C Mitter
Celso Omoto
Daniel Bernardi
DJ Funk
DP Pashley
DP Pashley
DP Pashley
DP Prowell
DW Pfennig
DW Whitman
EP Nawrocki
F Ortego
F Supek
F Yan
FC Wouters
G Glauser
GJ Kergoat
GJ Ragland
GR Busato
GR Busato
H Vogel
HA Orr
HD Rundle
HM Heidel-Fischer
HM Niemeyer
HM Niemeyer
I Eyres
J Stapley
JA Klun
JG Houseman
JH Zar
K Breddam
Karina Lucas Silva-Brandão
KD Hansen
KD Pruitt
KW Matsubayashi
L Y-J
LM Schoonhoven
M Dres
M Rostas
MA Beaumont
Marcelo Mendes Brandão
MD Celorio-Mancera
MD Robinson
MJ West-Eberhard
ML Juarez
ML Juarez
MR Kanost
ND Rawlings
OA Barski
P Dumas
P Nosil
PJ Waniek
R Feng
R Feyereisen
R Feyereisen
RD Finn
Renato Jun Horikoshi
RN Nagoshi
S Lima
S Mopper
S Via
S Via
SR Eddy
TP Souza
V Machado
WR Terra
Publication venue: Springer Science and Business Media LLC

Crossref

Assignment of PolyProline II Conformation and Analysis of Sequence – Structure Relationship

Author: A Bornot
A Kentsis
A Rath
AA Adzhubei
AA Adzhubei
AG de Brevern
AG de Brevern
AG de Brevern
AG de Brevern
AG de Brevern
AG de Brevern
Agnel Praveen Joseph
AK Jha
Alexandre G. de Brevern
AP Joseph
AP Joseph
AW Chan
B Hess
B Offmann
B Zagrovic
BJ Stapley
BK Kay
BW Chellgren
BW Chellgren
C Etchebest
CM Venkatachalam
CY Wu
D Eisenberg
D Frishman
D van der Spoel
DA Beck
E Lindahl
E Polverini
EJ Thompson
EW Blanch
F Avbelj
F Eker
FC Bernstein
FC Peterson
FM Richards
G Darnell
G Faure
G Faure
G Labesse
G Wang
G Wang
GB Banks
GD Rose
HJC Berendsen
HM Berman
J Esque
J Makowska
J Martin
J Martin
J Martin
JC Horng
JC Kendrew
Jean-Christophe Gelly
JM Hicks
JS Richardson
JS Richardson
K Chen
L Fourrier
L Pauling
L Pauling
L Pauling
L Pauling
LL Perskie
LL Porter
LR Rabiner
M Bansal
M Dudev
M Kuemin
M Mezei
M Tyagi
M Tyagi
M Tyagi
M Tyagi
M Tyagi
MA Kelly
Markus Buehler
MB Swindells
ML Tiffany
MV Cubellis
MV Cubellis
N Colloc'h
N Sreerama
NC Fitzkee
PK Vlasov
PL Obuchowski
PM Cowan
R Berisio
R Srinivasan
RV Pappu
S Arnott
S Jun
S Kutter
SA Hollingsworth
SJ Whittington
SM King
T Kameda
T Kohonen
TP Creamer
TP Creamer
V Sasisekharan
W Kabsch
WL Jorgensen
Y Watanabe
Yohann Mansiaux
Z Liu
Z Shi
Z Shi
Z Shi
Z Shi
Z Shi
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: Secondary structures are elements of great importance in structural biology, biochemistry and bioinformatics. They are broadly composed of two repetitive structures, namely α-helices and β-sheets, apart from turns, and the rest is associated with coil. These repetitive secondary structures have specific and conserved biophysical and geometric properties. The PolyProline II (PPII) helix is yet another interesting repetitive structure, which is less frequent and not usually associated with stabilizing interactions. Recent studies have shown that PPII frequency is higher than expected, and that PPII helices could have an important role in protein-protein interactions. METHODOLOGY/PRINCIPAL FINDINGS: A major factor that limits the study of PPII is that its assignment cannot be carried out with the most commonly used secondary structure assignment methods (SSAMs). The purpose of this work is to propose a PPII assignment methodology that can be defined in the frame of DSSP secondary structure assignment. Considering the ambiguity in PPII assignments by different methods, a consensus assignment strategy was utilized. To define the most consensual rule of PPII assignment, three SSAMs that can assign PPII were compared and analyzed. The assignment rule was defined to have a maximum coverage of all assignments made by these SSAMs. Not many constraints were added to the assignment and only PPII helices of at least 2 residues in length are defined. CONCLUSIONS/SIGNIFICANCE: The simple rules designed in this study for characterizing PPII conformation lead to the assignment of 5% of all amino acids as PPII. Sequence-structure relationships associated with PPII, defined by the different SSAMs, underline a few striking differences. A specific study of amino acid preferences in their N and C-cap regions was carried out, as well as of their solvent accessibility and contact patterns. Thus the assignment of PPII can be coupled with DSSP, which opens a simple way for further analysis in this field.

Public Library of Science (PLOS)

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

HAL Descartes

Hal-Diderot